ABSTRACT: A compiler is made up of a large number of individual components. Some of
these components are generated from formal specifications, some result from hand adaptation of
general algorithms, and some are standard modules from a library. Configuration control is a
serious problem in compiler construction: How do we subdivide the compilation task into
components, solve them individually, and then re-integrate the solutions into a consistent product?
We have successfully used the Odin object manager to solve this problem for a particular
compiler architecture. The resulting system illustrates many of the complexities of configuration control.

1.  Introduction 

Tool  support  is  currently  available  for  the  majority  of  compiler  subtasks.  Existing  tools 
were  developed  individually,  and  often  little  thought  was  given  to  how  each  tool  might  interact 
with  tools  supporting  other  subtasks.  The  overall  result  is  that  the  same  information  must 
appear  in  specifications  for  several  different  tools,  and  the  modules  or  code  fragments  produced 
by  distinct  tools  do  not  work  together  smoothly.  Moreover,  the  code  generated  by  many  tools 
has  significantly  poorer  performance  than  the  equivalent  code  produced  by  hand. 

Despite these disadvantages, generation of a compiler from formal specifications is important
because those specifications can be machine-checked for consistency. This means that,
although  a  tool-generated  compiler  may  not  behave  as  desired,  it  will  not  crash.  In  effect,  the 
product  is  a  correct  compiler  for  a  language  or  machine  that  differs  from  the  one  intended.  Such 
errors  in  specification  are  generally  easier  to  find  and  correct  than  errors  due  to  inconsistent  data 
structures  or  incorrect  algorithms. 

The  disadvantages  of  compiler  generation  fall  into  two  broad  classes: 

•  Inadequacy  of  individual  tools. 

•  Complexity  of  the  generation  process. 

Problems in both classes must be solved if generation is to become a common method for
constructing compilers. Significant progress in tool improvement has been made, and is reported
elsewhere1,2,3,4,5. Managing the complexity of the generation process is a configuration control
problem, and this paper presents our efforts in that area.

We have built a compiler construction system called Eli, which employs off-the-shelf tools
that  generate  compiler  components  from  formal  specifications.  Eli  accepts  a  non-redundant 
specification  of  the  desired  compiler,  derives  appropriate  tool  inputs  from  that  specification, 
applies  the  tools,  and  then  combines  the  generated  modules  or  fragments  with  code  from  a 
library  to  produce  a  complete  compiler. 

We are not concerned here with the tools employed by Eli, but rather with the management
of the specifications, tools and partial products. Nevertheless, we begin by outlining the
process of compiler construction, listing the specification mechanisms normally used for tool
inputs and giving references to more detailed discussions of their implementation. Then we
present Eli's configuration control problem, and indicate the facilities needed to solve it. Finally,
we discuss the implementation of Eli using Odin6 and RCS7.

2.  A  Brief  Outline  of  Compiler  Construction 

A rather stable gross design for a compiler has evolved from experience during the last
twenty years. This means that compilers for a wide variety of source languages and target
machines can be described by supplying “parameters” to a single design model. The parameters
consist, for the most part, of non-procedural specifications that define the behavior of
compilation subtasks. Many different descriptive techniques are used for these specifications, as
illustrated by Table 1.

The  relationships  among  the  subtasks  listed  in  Table  1  can  be  described  non-procedurally 
by an attribute grammar17,0. Mechanisms for dealing with the individual subtasks are also
well-known; references for the various descriptive techniques are given in the second column, and the
necessary algorithms can be found in most texts on compiler construction0,10. The third column
of  Table  1  indicates  how  a  compiler  designer  normally  obtains  a  description  of  the  corresponding 
subtask.  Several  of  the  subtasks,  such  as  the  mapping  of  identifiers  to  internal  representation 
and  the  analysis  of  the  source  language's  scope  rules,  are  carried  out  by  standard  algorithms  that 
can  be  used  unchanged.  These  algorithms  are  embodied  in  abstract  data  types  whose  operations 
are  invoked  at  appropriate  points  during  the  compilation  by  using  them  to  compute  attributes. 

Table 1

Subtasks of a Typical Compiler

Subtask                 Descriptive technique        User action
----------------------  ---------------------------  -----------
Scanning                Regular expression8,3        Adapt
Identifier table        Abstract data type2          -
Denotation conversion   Abstract data types9         Adapt
Parsing                 Context-free grammar10,11    Create
Scope analysis          Abstract data type12,13      -
Type analysis           Patterns14                   Create
Storage mapping         Abstract data type9          -
Operation mapping       Patterns13                   Create
Control mapping         Schemata14                   Create
Peephole optimization   Register transfers18         Adapt
Assembly                Formats9                     Adapt


Since a wide variety of languages have similar lexical structures, descriptions of the scanner
and the conversion of denotations (constants such as “1234”) can usually be adapted from
previously-existing descriptions. Register transfers and the formats that govern the treatment of
the target machine can also be adapted in many cases. Descriptions of parsing and type analysis
depend sensitively upon the source language, and the descriptions of operation and control
mapping define the relationship between the source language and the target machine. These
descriptions almost always must be created anew for each compiler, but there will be many
similarities with existing descriptions.

Given a coherent model of a compiler and a set of techniques to describe the various
components of that model, the obvious next step is to automate the construction of the compiler
itself19. Tools have been developed for generating most of the components from formal
specifications (see the references quoted in Table 1). Moreover, programs to control the
components’ interaction can be generated from attribute grammars20,21. A complete compiler can be
created if the outputs of all of these tools can be combined into a smoothly-functioning product.
A recent paper describing the ACORN system14, developed at Brown University in the early
1980’s, gives a good overview of the problem of creating a complete compiler.

3.  The  Configuration  Control  Problem 

Eli’s  configuration  control  problem  can  be  briefly  characterized  as  follows: 

• There is a fixed set of relationships among specifications, tools and partial products that
define the way in which a compiler must be built. In the terminology of DSEE22, these
relationships constitute the system model.

• The system can manufacture a large number of products, many of which are important
but rarely-used diagnostic aids.

• Consistency can only be checked by testing the specifications. The consistency of the
products must be guaranteed by guaranteeing a consistent derivation; no test for consistency
can be applied to the product.

• A “version” is defined by a particular set of specifications in association with a particular
set of parameters to the generation process. It is necessary to be able to re-generate a
specific version.

• Several people may be working on a single compiler. In general they will be working on
different parts of the specification, and each will want to have access to the latest versions
of the others' work.

The system model for Eli can be described by a derivation graph: an acyclic, bipartite
graph whose node classes are objects and processes respectively23. Arcs from object nodes to
process nodes indicate that the object is accessed by the process, while arcs from process nodes to
object nodes indicate that the object is created by the process. Every process node in a
derivation graph must have both predecessors and successors; an object node must have either
predecessors or successors or both. A variety of data formats is modeled in the derivation graph by
associating a type with each object. Two objects have the same type if and only if they contain
the same kind of information and have identical formats.
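
The structural rules just stated can be checked mechanically. The following Python sketch is an illustration only: the class, its method names, and the object and process names in the example are hypothetical and are not taken from Eli or Odin.

```python
class DerivationGraph:
    """Bipartite graph of typed objects and processes (hypothetical sketch)."""

    def __init__(self):
        self.object_types = {}   # object name -> type name
        self.processes = set()   # process names
        self.succ = {}           # node -> set of successor nodes
        self.pred = {}           # node -> set of predecessor nodes

    def _arc(self, source, target):
        self.succ.setdefault(source, set()).add(target)
        self.pred.setdefault(target, set()).add(source)

    def add_object(self, name, type_name):
        self.object_types[name] = type_name

    def add_process(self, name, inputs, outputs):
        """Add a process that accesses `inputs` and creates `outputs`."""
        self.processes.add(name)
        for obj in inputs:       # arc object -> process: object is accessed
            self._arc(obj, name)
        for obj in outputs:      # arc process -> object: object is created
            self._arc(name, obj)

    def check(self):
        """Return a list of violated structural rules (empty if none)."""
        errors = []
        for p in sorted(self.processes):
            preds = self.pred.get(p, set())
            succs = self.succ.get(p, set())
            if not preds or not succs:
                errors.append(f"process {p} lacks predecessors or successors")
            for n in sorted(preds | succs):
                if n not in self.object_types:   # bipartite: only objects
                    errors.append(f"process {p} is linked to non-object {n}")
        for o in sorted(self.object_types):
            if not self.pred.get(o) and not self.succ.get(o):
                errors.append(f"object {o} is isolated")
        if self._has_cycle():
            errors.append("graph contains a cycle")
        return errors

    def _has_cycle(self):
        # Depth-first search; a node met again while still "open" closes a cycle.
        OPEN, DONE = 1, 2
        state = {}

        def visit(node):
            state[node] = OPEN
            for nxt in self.succ.get(node, ()):
                if state.get(nxt) == OPEN:
                    return True
                if nxt not in state and visit(nxt):
                    return True
            state[node] = DONE
            return False

        nodes = list(self.object_types) + list(self.processes)
        return any(n not in state and visit(n) for n in nodes)


# Hypothetical two-object example: a grammar from which a parser is generated.
g = DerivationGraph()
g.add_object("pascal.con", "concrete_syntax")
g.add_object("parser.c", "c_source")
g.add_process("parser_generator", ["pascal.con"], ["parser.c"])
print(g.check())
```

Storing the type with each object, rather than with each arc, follows the text's statement that a type describes both the kind of information and its format.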

Figure  2  is  a  derivation  graph  that  describes  a  simplified  version  of  Eli’s  system  model. 
Many  products  are  omitted,  standard  compilers  and  the  linker  are  not  explicit,  and  internal 
derivations  carried  out  by  the  tools  have  been  suppressed.  The  omitted  portions  of  the  graph  do 
not  introduce  any  new  configuration  control  problems.  Rectangular  boxes  represent  tools,  oval 
boxes  represent  specifications,  and  unboxed  text  represents  derived  objects.  Currently  Eli  does 
not include tools that process peephole optimization and assembly specifications, so these
subtasks are implemented by abstract data types. Also, type analysis is currently handled within the
attribute  grammar. 

The  dotted  area  at  the  top  of  Figure  2  delimits  the  part  of  the  derivation  not  under  Eli’s 
control. CAGT24 is an interactive tool that establishes a relationship between the concrete
syntax used to describe the input program as written and the abstract syntax used to describe its
semantic  structure.  This  relationship  is  captured  by  an  additional  object  that  Eli  treats  as  an 
extra  specification.  During  the  derivation  of  the  compiler,  CAGT  is  used  in  a  “reverse”  batch 
mode  to  combine  the  concrete  syntax  with  the  connection  points  that  describe  the  abstract  tree 
structure.  The  result  is  a  parsing  grammar. 

The  content  of  a  derived  object  in  Figure  2  is  completely  determined  by  the  contents  of  the 
specifications  and  the  parameters  supplied  to  the  derivation.  Thus  it  is  theoretically  possible  to 
manufacture  a  product  directly  from  the  specifications  whenever  it  is  requested.  In  many  cases, 
however, the manufacturing effort can be reduced by re-using derived objects. This is
particularly important when several closely-related products are manufactured without changing the
specifications  or  parameters. 

Most  configuration  control  systems  maintain  a  cache  of  derived  objects,  and  re-derive  an 
object only when “necessary”. The simplest rule for determining that re-derivation is necessary
is  that  some  predecessor  of  the  object  in  the  derivation  graph  has  been  changed  since  the  object 
was  last  derived25.  This  rule  sometimes  leads  to  more  re-derivations  than  necessary,  and  more 
complex  rules  have  been  proposed  in  special  cases26. 
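
The simplest rule can be written down directly. The sketch below is an illustration under stated assumptions: process nodes are collapsed so that each object maps to its direct predecessor objects, and change and derivation times are plain integers. The function and argument names are hypothetical.

```python
def needs_rederivation(obj, direct_preds, last_changed, last_derived):
    """Apply the simplest rule: re-derive `obj` if any predecessor in the
    derivation graph has changed since `obj` was last derived.

    direct_preds: object -> list of its direct predecessor objects
    last_changed: object -> time its content last changed
    last_derived: derived object -> time it was last derived
    """
    if obj not in last_derived:
        return True                      # never derived, so it must be built
    pending = list(direct_preds.get(obj, ()))
    seen = set()
    while pending:                       # walk all (transitive) predecessors
        pred = pending.pop()
        if pred in seen:
            continue
        seen.add(pred)
        if last_changed.get(pred, 0) > last_derived[obj]:
            return True
        pending.extend(direct_preds.get(pred, ()))
    return False


# Hypothetical example: the grammar was edited at time 5.
direct_preds = {"parser.c": ["pascal.con"], "compiler": ["parser.c"]}
last_changed = {"pascal.con": 5}
last_derived = {"parser.c": 3, "compiler": 4}
print(needs_rederivation("compiler", direct_preds, last_changed, last_derived))
```

The rule is conservative: an edit that leaves a predecessor's meaning intact (a changed comment, for instance) still forces re-derivation, which is the over-derivation the text mentions.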

Cache  placement  and  access  strongly  affect  the  properties  of  a  configuration  control  system. 
The objects in the cache should be invisible to the user because derived objects that are not
products are of no concern. This is a simple application of the principle of modularity. On the
other hand, the cache itself should be a visible object associated with a particular project.
People working on the project should be able to share a particular cache, thereby making the results
of  their  work  available  to  each  other. 

4.  Implementation 

Figure 2

Simplified Derivation Graph for Eli

Eli is implemented as a set of off-the-shelf compiler construction tools managed by Odin6.
Odin accepts a request for a certain product, carries out the steps necessary to obtain that
product, and delivers the result to the requestor. It maintains a cache into which it stores every
derived object that it constructs. If a requested object is not in the cache, Odin uses the
derivation graph to decide how to construct that object. It then constructs the object by following a
path in the derivation graph, invoking the necessary processes and storing intermediate objects in
the cache.
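
The request handling described in this paragraph can be sketched as a recursive lookup. This is a simplified illustration, not Odin's algorithm: it ignores cache invalidation, represents processes as Python functions, and uses hypothetical object names.

```python
def request(obj, rules, sources, cache, log=None):
    """Return the content of `obj`, deriving and caching it if necessary.

    rules:   derived object -> (process function, list of input objects)
    sources: specification objects supplied by the user
    cache:   derived objects constructed so far
    log:     optional list recording which objects were actually derived
    """
    if obj in sources:
        return sources[obj]
    if obj in cache:
        return cache[obj]                # already derived: no work needed
    process, inputs = rules[obj]
    # Follow the derivation graph backwards, deriving inputs first.
    arguments = [request(i, rules, sources, cache, log) for i in inputs]
    result = process(*arguments)
    cache[obj] = result                  # intermediate objects are cached too
    if log is not None:
        log.append(obj)
    return result


# Hypothetical derivation: grammar -> parser source -> compiler.
sources = {"pascal.con": "grammar text"}
rules = {
    "parser.c": (lambda grammar: f"parser for <{grammar}>", ["pascal.con"]),
    "compiler": (lambda parser: f"binary of <{parser}>", ["parser.c"]),
}
cache, log = {}, []
request("compiler", rules, sources, cache, log)   # derives parser.c, then compiler
request("compiler", rules, sources, cache, log)   # answered from the cache
print(log)
```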

The objects manipulated by Odin are normal Unix files. Each process node in the derivation
graph is associated with a Unix shell script. When a process is to be run, Odin modifies the
shell  script  by  filling  in  the  names  of  the  files  that  represent  the  associated  objects,  and  then 
invokes  the  normal  shell  to  execute  the  modified  script. 
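
One way to picture this step is as template substitution followed by a shell invocation. The sketch below is an assumption about the general technique, not a description of Odin's script format: the `$source` and `$result` placeholders and both function names are hypothetical.

```python
import shlex
import subprocess
from string import Template


def fill_script(template, **files):
    """Fill the file names of the associated objects into a script template.

    Each name is shell-quoted so that unusual characters in a path cannot
    change the meaning of the script.
    """
    quoted = {name: shlex.quote(path) for name, path in files.items()}
    return Template(template).substitute(quoted)


def run_process(template, **files):
    """Execute the filled-in script with the normal shell (requires `sh`)."""
    subprocess.run(["sh", "-c", fill_script(template, **files)], check=True)


# Hypothetical process node whose script sorts one object into another.
print(fill_script("sort $source > $result", source="in.txt", result="out dir/x"))
```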

A shell variable in the user's environment is assumed to specify the path name of a
directory containing the derivation graph and subdirectories for the process nodes' shell scripts and
the cache. There can, of course, be many such directories that link to the same derivation graph
and shell scripts but have separate caches. Each member of the project group can therefore use
a private cache or one shared among a set of colleagues, as appropriate. In any case, the
products requested are guaranteed to be consistent with the specifications used to derive them.

Because Odin objects are arbitrary Unix files, and because processes are defined by
arbitrary Unix shell scripts, Odin provides the necessary flexibility to accommodate off-the-shelf tools.
Odin  is  also  in  the  public  domain,  and  is  easily  transported  to  any  Unix  system.  Unix  itself 
simplifies  the  task  of  building  processes  that  split,  filter  and  merge  data. 

Users interact with Eli via the Odin query language. The derivation graph that defines Eli
allows the user to derive any one of a number of objects from a specification, and these
derivations may be parameterized. A “query” is a request for a particular derivation. Each query
begins by stating the specification from which the derivation is to start, continues with a
sequence (possibly empty) of keyword parameters and their values, and concludes with the object
to be derived. Keyword parameters are introduced by the character “+”, and an object to be
derived is introduced by the character “:”. For example, a Pascal compiler called “mypc” might
be created from a specification called “pascal.g” by the following query:

    pascal.g +name=mypc :compiler

Here the desired object is of type “compiler”, and “mypc” is to be used as the value of the
keyword parameter “name” in the derivation.
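
The three-part shape of a query (specification, keyword parameters, object to derive) can be made concrete with a small parser. This sketch covers only the form described here; the full Odin query language may allow more, and the `Query` type and `parse_query` function are hypothetical names.

```python
import re
from typing import NamedTuple


class Query(NamedTuple):
    specification: str   # object from which the derivation starts
    parameters: dict     # keyword parameter -> value
    product: str         # type of the object to be derived


_QUERY = re.compile(
    r"""
    ^\s*(?P<spec>[^\s+:]+)                         # starting specification
    (?P<params>(?:\s*\+\s*\w+\s*=\s*[^\s+:]+)*)    # zero or more +key=value
    \s*:\s*(?P<product>\w+)\s*$                    # :object to derive
    """,
    re.VERBOSE,
)
_PARAM = re.compile(r"\+\s*(\w+)\s*=\s*([^\s+:]+)")


def parse_query(text):
    """Split a query of the form 'spec +key=value ... :product'."""
    match = _QUERY.match(text)
    if match is None:
        raise ValueError(f"not a valid query: {text!r}")
    parameters = dict(_PARAM.findall(match.group("params")))
    return Query(match.group("spec"), parameters, match.group("product"))


print(parse_query("pascal.g +name=mypc :compiler"))
```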

Queries can be obtained from files or presented interactively, and there is a history
mechanism that allows one to modify and reuse interactive queries. Each query must specify a single
object  from  which  a  product  is  to  be  derived.  This  object  may  be  a  list  of  file  names,  however. 
An  object  of  type  “.g”  is  a  list  of  the  file  names  that  contain  all  of  the  specifications  in  the  dotted 
part  of  Figure  2  except  the  abstract  data  types.  Abstract  data  types  are  introduced  into  the 
derivation  as  values  of  keyword  parameters,  because  the  set  of  abstract  data  types  varies  from 
one  compiler  to  the  next. 

We use RCS7 to provide version control for the specifications. If the files mentioned by the
“.g” object from which the derivation begins are not available in the working directory, they are
sought in a subdirectory of the working directory named “RCS”. When no version parameter has
been given, the latest revision is accessed. A version parameter can be specified in the query, and
will be passed to RCS if it is present. Thus all of the version naming facilities of RCS are
available.

An  Odin  query  corresponds  to  DSEE’s  configuration  thread22:  Objects  derived  from  queries 
with  different  parameter  values  are  considered  by  Odin  to  be  potentially  distinct.  (When  Odin 
derives  a  new  instance  of  an  object  it  compares  that  instance  with  any  existing  instance.  If  they 
are  identical,  it  marks  the  object  as  unchanged  by  the  derivation.)  Thus  a  query  that  contains  a 
version  parameter  can  be  saved  and  used  to  regenerate  a  particular  version  of  any  product  at 
any  time. 
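
The comparison of a new instance with an existing one can be sketched as follows. The use of a content hash and the shape of the cache key are assumptions made for this illustration; the text does not say how Odin performs the comparison.

```python
import hashlib


def store_derived(cache, key, content):
    """Store a newly derived instance and report whether it changed.

    key:     (object name, frozenset of parameter items); queries with
             different parameter values therefore get distinct entries
    content: bytes of the newly derived instance
    Returns False when the new instance is identical to the cached one,
    so the object can be marked as unchanged by the derivation.
    """
    digest = hashlib.sha256(content).hexdigest()
    existing = cache.get(key)
    if existing is not None and existing[0] == digest:
        return False
    cache[key] = (digest, content)
    return True


# Hypothetical key for the query "pascal.g +name=mypc :compiler".
cache = {}
key = ("compiler", frozenset({"name": "mypc"}.items()))
print(store_derived(cache, key, b"object code"))   # first instance: changed
print(store_derived(cache, key, b"object code"))   # identical: unchanged
```

Marking an identical instance as unchanged lets objects further along the derivation graph stay valid even though one of their predecessors was re-derived.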

5.  Conclusions 

Eli has been successful in improving the productivity of compiler constructors. One
graduate student at the University of Colorado constructed a complete compiler for Whetstone
ALGOL27 in six weeks. He had previously taken the graduate compiler construction course
(which was not based on tools) and had had industrial compiler experience. He had studied the
ALGOL 60 Report in a graduate programming languages course, but had no other experience
with ALGOL. His compiler did not follow the design given in Randell and Russell’s book, since
that design does not fit the Eli model, so the six weeks included a complete redesign. An
experienced software engineer, who had worked with the Whetstone compiler on the ICL KDF9,
estimated that he could carry out a new implementation of Randell and Russell’s compiler in
twelve weeks28.

Odin  was  an  appropriate  mechanism  to  use  in  constructing  Eli.  The  concept  of  a  derivation 
graph  captures  the  designer's  intuition  precisely,  making  it  easy  to  understand  and  modify. 
Odin's implementation, although somewhat obscure in certain details, is flexible enough to
incorporate off-the-shelf components. This greatly simplifies the development of the system. It
also  increases  the  system's  lifetime,  because  individual  tools  can  be  replaced  by  better  technology 
without  seriously  disturbing  the  entire  environment. 